Descriptor collision and confusion: Toward the design of descriptors to mask chemical structures

Authors

  • Cristian Bologa
  • Tharun Kumar Allu
  • Marius Olah
  • Michael A. Kappler
  • Tudor I. Oprea
Abstract

We examined "descriptor collision" for several chemical fingerprint systems (MDL 320, Daylight, SMDL) and for a 2D-based descriptor set. For large databases (ChemNavigator and WOMBAT), the smallest collision rate remains around 5%. We systematically increased the "descriptor collision" rate (here termed "descriptor confusion") in order to design a set of "descriptors to mask chemical structures" (DMCS). If effective, a DMCS system would not allow third parties to determine the original chemical structures used to derive the DMCS set (i.e., to reverse engineer them). Using SMDL keys, the "confusion" rate was increased to 45.6% by eliminating those keys that have a low frequency of occurrence in WOMBAT structures. We applied an automated PLS engine, WB-PLS [Olah et al., J. Comput. Aided Mol. Des., 18 (2004) 437], to 1277 series of structures from 948 targets in WOMBAT, in order to validate the biological relevance of the SMDL descriptors as a potential DMCS set. The "reduced set" of SMDL descriptors shows a small loss of modeling power (around 20%) compared to the initial descriptor set, while its collision rate is significantly higher. These results indicate that the development of an effective DMCS is possible. If well documented, DMCS systems would encourage private sector data release (e.g., related to water solubility) and directly benefit public sector science.
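The idea of raising the collision rate by discarding rarely set keys can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: fingerprints are plain lists of 0/1 bits, a "collision" means a structure shares its exact fingerprint with at least one other structure, and the function names, the frequency threshold, and the toy data are all hypothetical. The authors' actual definitions and key sets may differ.

```python
from collections import Counter


def collision_rate(fingerprints):
    """Fraction of structures whose fingerprint is identical to that of
    at least one other structure (an assumed definition of 'collision')."""
    fps = [tuple(fp) for fp in fingerprints]
    if not fps:
        return 0.0
    counts = Counter(fps)
    collided = sum(1 for fp in fps if counts[fp] > 1)
    return collided / len(fps)


def drop_rare_keys(fingerprints, min_frequency):
    """Remove key positions that are set in fewer than `min_frequency`
    (a fraction between 0 and 1) of the structures.

    Returns the reduced fingerprints and the indices of the kept keys."""
    n = len(fingerprints)
    n_keys = len(fingerprints[0])
    freq = [sum(fp[k] for fp in fingerprints) / n for k in range(n_keys)]
    kept = [k for k in range(n_keys) if freq[k] >= min_frequency]
    reduced = [[fp[k] for k in kept] for fp in fingerprints]
    return reduced, kept


# Hypothetical toy data: five structures, four keys.
toy = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]

before = collision_rate(toy)
reduced, kept = drop_rare_keys(toy, min_frequency=0.5)
after = collision_rate(reduced)
print(before, kept, after)
```

In this toy set the last key is set in only one of five structures. Dropping it makes the first two fingerprints identical, so the share of colliding structures rises from two of five to four of five. The same trade-off appears in the abstract: fewer keys mean more confusion, at the cost of some modeling power.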


Similar resources

QSAR models to predict physico-chemical properties of some barbiturate derivatives using molecular descriptors and genetic algorithm-multiple linear regressions

In this study, the relationship between descriptors selected by a genetic algorithm and the polarizability (POL), molar refractivity (MR), and octanol/water partition coefficient (LogP) of barbiturates is studied. The chemical structures of the molecules were optimized using the ab initio method with the 6-31G basis set and the Polak-Ribiere conjugate gradient algorithm within the HyperChem 8.0 environ...


Shape-based object retrieval using descriptors obtained from a contour growing process

In this paper, a novel shape descriptor for shape-based object retrieval is proposed. A growing process is introduced in which a contour is reconstructed from the bounding circle of the shape. In this growing process, the circle points move toward the shape in the normal direction until they reach the shape contour. Three different shape descriptors are extracted from this process: the first descript...


A Novel Molecular Descriptor Derived from Weighted Line Graph

The Bertz indices, derived by counting the number of connecting edges of line graphs of a molecule were used in deriving the QSPR models for the physicochemical properties of alkanes. The inability of these indices to identify the hetero centre in a chemical compound restricted their applications to hydrocarbons only. In the present work, a novel molecular descriptor has been derived from the w...


GPS Jamming Detection in UAV Navigation Using Visual Odometry and HOD Trajectory Descriptor

Autonomous navigation of unmanned aerial vehicles (UAVs) in outdoor environments relies on a Global Positioning System (GPS) receiver. The power of the GPS signal at the earth's surface is very low. This can degrade the performance of GPS receivers in environments contaminated with other sources of radio frequency interference (RFI). GPS jamming and spoofing are the most serious a...


Quantitative Structure-Property Relationship Modeling of the Redox Potential for Some Phenolic Antioxidants

In this work, quantitative structure-property relationship (QSPR) approaches were used to predict the redox potential of 42 phenolic antioxidants. The structures of all compounds were optimized by the AM1 semi-empirical method, and then a large number of molecular descriptors were calculated for each compound in the data set. Subsequently, stepwise multilinear regression was applied to select the mos...



Journal:
  • Journal of Computer-Aided Molecular Design

Volume 19, Issue 9-10


Publication date: 2005